
Introduction to Time Series Analysis - 01

This note is for course MATH 545 at McGill University.

Lecture 1 - Lecture 3

Reference Book

  • Introduction to Time Series and Forecasting (by Brockwell and Davis)
  • The Analysis of Time Series: an Introduction with R (by Chatfield and Xing)

Time series

\(\{X_t\}\) is a collection of random variables, where \(t\) is the index of time.

The process of dealing with a time series
  1. Describe: plot the data to obtain a concise summary
  2. Explain: fit probabilistic models (joint distributions)
  3. Predict: forecast future values together with a quantification of their uncertainty

If the \(X_t\) were mutually independent, the joint distribution would factorize as \(Pr(X_1 \leq x_1, X_2 \leq x_2, ..., X_n \leq x_n) = \prod_{i=1}^n Pr(X_i \leq x_i)\). For more general (dependent) models we instead work with the decomposition \(Pr(X_1 \leq x_1, X_2 \leq x_2, ..., X_n \leq x_n) = Pr(X_1 \leq x_1)\,Pr(X_2 \leq x_2 \mid X_1 \leq x_1) \cdots Pr(X_n \leq x_n \mid X_1 \leq x_1, ..., X_{n-1} \leq x_{n-1})\).

Semi-parametric model

In semi-parametric models we do not specify the pdf or cdf of the random variables; instead we specify only \(E(X_t)\) and \(Cov(X_t, X_{t+j})\).

Examples
  1. iid noise: let \(E(X_t) = 0\) for all \(t\), and \(Pr(X_1 \leq x_1, X_2 \leq x_2, ..., X_n \leq x_n) = \prod_{i=1}^n Pr(X_i \leq x_i) = \prod_{i=1}^n F(x_i)\), where \(F(\cdot)\) is the common cumulative distribution function.
  2. random walk: let \(\{X_t\}\) be iid noise and set \(S_t = X_1 + X_2 + ... + X_t\). Then \(\{S_t\}\) is a random walk. (Note that the \(S_t\) are not independent, but \(E(S_t) = 0\).)

Models with trend

Let \(\{Y_t\}\) be a time series with \(E(Y_t) = 0\) for all \(t\), and let \(X_t = m_t + Y_t\), where \(m_t\) is a slowly changing function of time (the trend). (Note that \(Y_t\) is the mean-zero noise component, while \(m_t\) is what gives \(X_t\) a non-zero mean: \(E(X_t) = m_t\).)

Common choices for \(m_t\) include a linear function of \(t\) or, more generally, a polynomial in \(t\).
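
As a minimal sketch (not part of the notes), such a trend \(\hat{m}_t\) can be estimated by ordinary least squares; the series below is simulated purely for illustration.

```python
import numpy as np

# Hypothetical example: linear trend plus mean-zero noise Y_t
rng = np.random.default_rng(0)
n = 200
t = np.arange(n)
x = 0.5 + 0.03 * t + rng.normal(0, 1, n)   # X_t = m_t + Y_t

# Fit a degree-1 polynomial m_t = b0 + b1 * t by least squares
coeffs = np.polyfit(t, x, deg=1)           # returns (b1, b0), highest degree first
m_hat = np.polyval(coeffs, t)              # fitted trend \hat{m}_t

residuals = x - m_hat                      # X_t - \hat{m}_t, the residual process
print(coeffs)                              # roughly (0.03, 0.5)
```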

Models with seasonal variation (periodicity)

Let \(X_t = S_t + Y_t\), where \(E(Y_t) = 0\) for all \(t\) and \(S_t\) is a periodic function of \(t\) with period \(d\) (i.e. \(S_{t-d} = S_t\)).

Common choices for \(S_t\) include sums of harmonic functions, \(S_t = a_0 + \sum_{j=1}^{k} (a_j \cos(\lambda_j t) + b_j \sin(\lambda_j t))\), where the \(a_j\) and \(b_j\) are estimated from the data and the \(\lambda_j\) are fixed frequencies.
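
As a sketch: once the frequencies \(\lambda_j\) are fixed, the coefficients \(a_j, b_j\) can be estimated by linear least squares on the cosine and sine regressors. The monthly period \(d = 12\) and the single frequency below are assumptions made only for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 240, 12                       # hypothetical monthly data with period d = 12
t = np.arange(n)
lam = 2 * np.pi / d                  # one fixed frequency lambda_1
x = 3 + 2 * np.cos(lam * t) - np.sin(lam * t) + rng.normal(0, 1, n)

# Design matrix [1, cos(lam*t), sin(lam*t)]; estimate (a0, a1, b1) by least squares
D = np.column_stack([np.ones(n), np.cos(lam * t), np.sin(lam * t)])
coef, *_ = np.linalg.lstsq(D, x, rcond=None)
s_hat = D @ coef                     # fitted seasonal component \hat{S}_t
print(coef)                          # approximately (3, 2, -1)
```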

General strategy for analysis
  1. Plot the data to
    1. identify potential signal (trend, seasonal)
    2. identify possible models for the residual process
    3. identify outliers and other weird things
  2. Remove the signal
  3. Choose a model to fit the residuals and estimate the dependence
  4. Forecast by projecting the residuals forward and adding them back to the estimated signal

Why we focus on the residuals (i.e. \(X_t - \hat{m}_t\), \(X_t - \hat{S}_t\))

Let \(W_i \overset{\text{iid}}{\sim} N(\mu, \sigma^2)\); then \(W_i - \mu \sim N(0, \sigma^2)\). We can estimate \(\mu\) and subtract it to remove the signal, and then estimate \(\sigma^2\) from the residuals.

Stationary process (series)

Informally, a process is stationary if \(\{X_s\}_{s=0, 1, ..., n}\) has the same statistical properties as \(\{X_{t+s}\}_{s=0, 1, ..., n}\) for every shift \(t\). (Note that we will focus on the first and second order moments.) iid noise is a special case of a stationary process.

Def. \(X_t\) is weakly stationary if

  1. \(E(X_t) = \mu_X(t)\) is independent of \(t\)
  2. \(Cov(X_r, X_s) = E((X_r - \mu_X(r))(X_s - \mu_X(s))) = \gamma_X(r, s)\), where \(\gamma_X\) is the covariance function of \(X_t\)

We require that \(\gamma_X(t+h, t)\) is independent of \(t\) (i.e. \(\gamma_X(t+h, t) = \gamma_X(h, 0) = Cov(X_h, X_0)\)).

Def. For strong stationarity, we require that the joint distribution of \(\{X_s\}_{s=0, 1, ..., n}\) is the same as that of \(\{X_{t+s}\}_{s=0, 1, ..., n}\) for all \(t\) and \(n\).

We define \(\gamma_X(h) = \gamma_X(h, 0)\) to be the auto-covariance function of a stationary series at lag \(h\).

We define \(\rho_X(h)\) to be the auto-correlation function at lag \(h\): \(\rho_X(h) = \frac{\gamma_X(h)}{\gamma_X(0)} = Cor(X_{t+h}, X_t)\).
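
As a small sketch (not from the lecture), the usual sample versions of \(\gamma_X(h)\) and \(\rho_X(h)\) for an observed series can be computed as follows, using the conventional estimator that divides by \(n\).

```python
import numpy as np

def sample_acvf(x, h):
    """Sample autocovariance at lag h (dividing by n, as is conventional)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xbar = x.mean()
    h = abs(h)
    return np.sum((x[h:] - xbar) * (x[:n - h] - xbar)) / n

def sample_acf(x, h):
    """Sample autocorrelation: gamma_hat(h) / gamma_hat(0)."""
    return sample_acvf(x, h) / sample_acvf(x, 0)

# For iid noise the sample ACF at nonzero lags should be close to 0
rng = np.random.default_rng(2)
x = rng.normal(0, 1, 1000)
print([round(sample_acf(x, h), 3) for h in range(4)])
```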

Useful identity

If \(E(X^2) < \infty\), \(E(Y^2) < \infty\), \(E(Z^2) < \infty\) and \(a, b, c\) are real constants, then \(Cov(aX + bY + c, Z) = a\,Cov(X, Z) + b\,Cov(Y, Z)\).

Example 1: iid noise

\(X_t \overset{\text{iid}}{\sim} N(0, \sigma^2)\)

By definition we have \(E(X_t) = 0\). Since \(E(X_t^2) = \sigma^2 < \infty\), \(\gamma_X(h) = Cov(X_{t+h}, X_t) = \begin{cases} \sigma^2, & \text{if } h = 0 \\ 0, & \forall h \neq 0 \text{ (by independence)} \end{cases}\)

Therefore an iid noise process is weakly stationary.

Example 2: White Noise Process

If \(\{X_t\}\) is a sequence of uncorrelated random variables with \(E(X_t) = 0\), \(Var(X_t) = \sigma^2 < \infty\), and \(\gamma_X(h) = 0\) for all \(h \neq 0\), then we refer to it as white noise and write \(X_t \sim WN(0, \sigma^2)\).

Note that iid noise is white noise, but white noise is not necessarily iid noise.

Example 3

Suppose \(\{W_t\}\) and \(\{Z_t\}\) are iid sequences with \(\{W_t\} \perp \{Z_t\}\).

Let the \(W_t\) follow a Bernoulli distribution with \(Pr(W_t = 0) = Pr(W_t = 1) = 1/2\).

Let the \(Z_t\) follow a transformed Bernoulli distribution with \(Pr(Z_t = -1) = Pr(Z_t = 1) = 1/2\).

Set \(X_t = W_t(1 - W_{t-1})Z_t\); the possible values of \(X_t\) are given in the following table:

| \(W_{t-1}\) | \(W_t\) | \(X_t\) |
| --- | --- | --- |
| 1 | 0 | 0 |
| 1 | 1 | 0 |
| 0 | 0 | 0 |
| 0 | 1 | \(Z_t\) |

\(E(X_t) = E(W_t)\,E(1 - W_{t-1})\,E(Z_t) = \frac{1}{2} \times \frac{1}{2} \times 0 = 0\)

When calculating covariance, there are two cases:

  1. \(h = 0\)

\(Cov(X_t, X_{t+h}) = E(X_t X_{t+h}) = E(W_t^2 (1 - W_{t-1})^2 Z_t^2) = E(W_t^2)\,E((1 - W_{t-1})^2)\,E(Z_t^2) = \frac{1}{2} \times \frac{1}{2} \times 1 = \frac{1}{4}\)

  2. \(h \neq 0\)

\(Cov(X_t, X_{t+h}) = E(X_t X_{t+h}) = E\big(W_t(1 - W_{t-1})Z_t \, W_{t+h}(1 - W_{t+h-1})Z_{t+h}\big) = 0\), since \(Z_t\) is independent of all the other factors and \(E(Z_t) = 0\).

Therefore, \(X_t\) is a white noise process.

Note that \(X_t\) and \(X_{t-1}\) are dependent but uncorrelated.
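
A quick numerical check of this example (a sketch, not from the notes): simulate \(W_t\) and \(Z_t\), form \(X_t\), and verify that the mean and the lag-1 correlation are near zero while \(X_t\) and \(X_{t-1}\) remain dependent.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
w = rng.integers(0, 2, n + 1)            # W_t in {0, 1}, each with probability 1/2
z = rng.choice([-1, 1], n + 1)           # Z_t in {-1, 1}, each with probability 1/2
x = w[1:] * (1 - w[:-1]) * z[1:]         # X_t = W_t (1 - W_{t-1}) Z_t

print(x.mean())                          # close to 0
print(np.corrcoef(x[1:], x[:-1])[0, 1])  # lag-1 correlation close to 0
# Dependence: whenever X_{t-1} != 0 we must have W_{t-1} = 1, which forces X_t = 0
print(np.abs(x[1:][x[:-1] != 0]).max())  # exactly 0
```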

Example 4: Random walk

Let \(\{X_t\}\) be iid noise with variance \(\sigma^2\), and let \(S_t = X_1 + ... + X_t = \sum_{i=1}^{t} X_i\).

We have \(E(S_t) = 0\) and \(Var(S_t) = t\sigma^2\).

For \(h \geq 0\): \(Cov(S_{t+h}, S_t) = Cov(S_t + [X_{t+1} + ... + X_{t+h}], S_t) = Cov(S_t, S_t) + Cov(X_{t+1} + ... + X_{t+h}, S_t) = t\sigma^2 + 0 = t\sigma^2\)

Therefore, the random walk is not stationary: its variance and covariance depend on \(t\).
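
A small simulation sketch of this (the sample size and \(\sigma\) below are arbitrary): across many independent replications, the sample variance of \(S_t\) grows roughly like \(t\sigma^2\).

```python
import numpy as np

rng = np.random.default_rng(4)
sigma = 1.0
reps, n = 5000, 100
x = rng.normal(0, sigma, size=(reps, n))   # iid noise X_1, ..., X_n in each replication
s = x.cumsum(axis=1)                       # random walks S_t = X_1 + ... + X_t

for t in (10, 50, 100):
    print(t, s[:, t - 1].var())            # approximately t * sigma^2
```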

Example 5: First order moving average process (MA(1))

Let \(Z_t \sim WN(0, \sigma^2)\) and let \(X_t = Z_t + \theta Z_{t-1}\), \(t = 0, \pm 1, \pm 2, ...\), where \(\theta\) is a real-valued constant.

(Graphical representation will be added later)

We have \(E(X_t) = E(Z_t) + \theta E(Z_{t-1}) = 0\).

\(Var(X_t) = E(X_t^2) = E((Z_t + \theta Z_{t-1})^2) = E(Z_t^2) + 2\theta E(Z_t Z_{t-1}) + \theta^2 E(Z_{t-1}^2) = (1 + \theta^2)\sigma^2\)

When calculating covariance, there are three cases:

  1. \(h = 0\)

\(\gamma_X(t+h, t) = E(X_{t+h} X_t) = E(X_t^2) = (1 + \theta^2)\sigma^2\)

  2. \(h = \pm 1\)

Taking \(h = 1\) (the case \(h = -1\) is symmetric): \(\gamma_X(t+h, t) = E(X_{t+h} X_t) = E((Z_{t+1} + \theta Z_t)(Z_t + \theta Z_{t-1})) = E(Z_{t+1} Z_t) + \theta E(Z_t^2) + \theta E(Z_{t+1} Z_{t-1}) + \theta^2 E(Z_t Z_{t-1}) = \theta\sigma^2\)

  3. \(|h| > 1\)

\(\gamma_X(t+h, t) = E(X_{t+h} X_t) = E((Z_{t+h} + \theta Z_{t+h-1})(Z_t + \theta Z_{t-1})) = 0\), because the indices \(t\), \(t-1\), \(t+h\), \(t+h-1\) are all distinct when \(|h| > 1\), so every cross term has expectation zero.

Therefore, MA(1) is stationary, and \(\rho_X(h) = \begin{cases} 1, & h = 0 \\ \frac{\theta}{1 + \theta^2}, & h = \pm 1 \\ 0, & |h| > 1 \end{cases}\)
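
As a closing sketch (with assumed values \(\theta = 0.6\) and \(\sigma = 1\)), simulating an MA(1) and comparing its sample ACF at lags 0, 1, 2 with the formula above:

```python
import numpy as np

rng = np.random.default_rng(5)
theta, sigma, n = 0.6, 1.0, 50_000
z = rng.normal(0, sigma, n + 1)       # white noise Z_t
x = z[1:] + theta * z[:-1]            # X_t = Z_t + theta * Z_{t-1}

def acf(x, h):
    """Sample autocorrelation at lag h."""
    x = x - x.mean()
    return np.sum(x[h:] * x[:len(x) - h]) / np.sum(x * x)

print([round(acf(x, h), 3) for h in range(3)])       # sample ACF at lags 0, 1, 2
print([1.0, round(theta / (1 + theta**2), 3), 0.0])  # theoretical values: 1, 0.441, 0
```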